Hadoop Based Data Intensive Computation on IaaS Cloud Platforms

Author

  • Sanjay P. Ahuja
Abstract

Chapter 1: Introduction
  1.1 Cloud Platforms
    1.1.1 Amazon Elastic Compute Cloud (Amazon EC2)
    1.1.2 Amazon Elastic Map Reduce (Amazon EMR)
  1.2 Data Intensive Computation
    1.2.1 Hadoop
    1.2.2 MapReduce
  1.3 Benchmarks
    1.3.1 HiBench Benchmarks
  1.4 Research Objectives
Chapter 2: Literature Review
  2.1 Studies Using HiBench Benchmarks
  2.2 Studies on Amazon Cloud Services vs. Other Cloud Platforms
Chapter 3: Research Methodology
Chapter 4: Testbed Setup

Similar resources

Cloud Computing Technology Algorithms Capabilities in Managing and Processing Big Data in Business Organizations: MapReduce, Hadoop, Parallel Programming

The objective of this study is to verify the importance of cloud computing service capabilities in managing and analyzing big data in business organizations. The rapid development in the use of information technology in general, and network technology in particular, has led many organizations to make their applications available for use via electronic platforms hos...

A High Performance, Spatiotemporal Statistical Analysis System Based on a Spatiotemporal Cloud Platform

With the increase in size and complexity of spatiotemporal data, traditional methods for performing statistical analysis are insufficient for meeting real-time requirements for mining information from Big Data, due to both data- and computing-intensive factors. To solve the Big Data challenges in geostatistics and to support decision-making, a high performance, spatiotemporal statistical analysis...

An Efficient Bulk Synchronous Parallelized Scheduler for Bioinformatics Application on Public Cloud

Genomic sequence alignment of varied species is one of the most sought-after applications in bioinformatics. In the future, bioinformatics technologies are expected to produce terabytes of genomic data. Sequence alignment computation requires a supercomputer, which involves huge cost. Parallelization is a way forward in computing sequence alignment with limited cost a...

A Performance Analysis of Hadoop Clusters in OpenStack Cloud and in Real System

Cloud computing, data, and distributed systems are three important aspects of this paper. Cloud computing is being embraced by every organization and implemented in every field of work, be it business or education. Data storage and processing are fundamental tasks of any organization. Hadoop is a distributed framework created to handle big data processing. The aim of this p...

A Novel Spatio-Temporal Data Storage and Index Method for ARM-Based Hadoop Server

During the past decade, a vast number of GPS devices have produced massive amounts of data containing both time and spatial information. This poses a great challenge for traditional spatial databases. With the development of distributed cloud computing, many high-performance cloud platforms have been built, which can be used to process such spatio-temporal data. In this research, to store and pr...

Journal title:
  • Computer and Information Science

Volume 8  Issue 

Pages  -

Publication date 2015